Fictional Character


Dynamic Context Adaptation for Consistent Role-Playing Agents with Retrieval-Augmented Generation

Park, Jeiyoon, Han, Yongshin, Kim, Minseop, Yang, Kisu

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) have catalyzed research on role-playing agents (RPAs). However, the process of collecting character-specific utterances and continually updating model parameters to track rapidly changing persona attributes is resource-intensive. Although retrieval-augmented generation (RAG) can alleviate this problem, if a persona does not contain knowledge relevant to a given query, RAG-based RPAs are prone to hallucination, making it challenging to generate accurate responses. In this paper, we propose Amadeus, a training-free framework that can significantly enhance persona consistency even when responding to questions that lie beyond a character's knowledge. Amadeus is composed of Adaptive Context-aware Text Splitter (ACTS), Guided Selection (GS), and Attribute Extractor (AE). To facilitate effective RAG-based role-playing, ACTS partitions each character's persona into optimally sized, overlapping chunks and augments this representation with hierarchical contextual information. AE identifies a character's general attributes from the chunks retrieved by GS and uses these attributes as a final context to maintain robust persona consistency even when answering out-of-knowledge questions. To underpin the development and rigorous evaluation of RAG-based RPAs, we manually construct CharacterRAG, a role-playing dataset that consists of persona documents for 15 distinct fictional characters totaling 976K written characters, and 450 question-answer pairs. We find that our proposed method effectively models not only the knowledge possessed by characters, but also various attributes such as personality.


LibriQuote: A Speech Dataset of Fictional Character Utterances for Expressive Zero-Shot Speech Synthesis

Michel, Gaspard, Epure, Elena V., Cerisara, Christophe

arXiv.org Artificial Intelligence

Text-to-speech (TTS) systems have recently achieved more expressive and natural speech synthesis by scaling to large speech datasets. However, the proportion of expressive speech in such large-scale corpora is often unclear. Besides, existing expressive speech corpora are typically smaller in scale and primarily used for benchmarking TTS systems. In this paper, we introduce the LibriQuote dataset, an English corpus derived from read audiobooks, designed for both fine-tuning and benchmarking expressive zero-shot TTS systems. The training dataset includes 12.7K hours of read, non-expressive speech and 5.3K hours of mostly expressive speech drawn from character quotations. Each utterance in the expressive subset is supplemented with the context in which it was written, along with pseudo-labels of the speech verbs and adverbs used to describe the quotation (e.g. "he whispered softly"). Additionally, we provide a challenging 7.5-hour test set intended for benchmarking TTS systems: given a neutral reference speech as input, we evaluate a system's ability to synthesize an expressive utterance while preserving the reference timbre. We qualitatively validate the test set by showing that it covers a wide range of emotions compared to non-expressive speech, along with various accents. Extensive subjective and objective evaluations show that fine-tuning a baseline TTS system on LibriQuote significantly improves the intelligibility of its synthesized speech, and that recent systems fail to synthesize speech as expressive and natural as the ground-truth utterances. The dataset and evaluation code are freely available. Audio samples can be found at https://libriquote.github.io/.


MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

Dai, Yanqi, Hu, Huanran, Wang, Lei, Jin, Shengjie, Chen, Xu, Lu, Zhiwu

arXiv.org Artificial Intelligence

Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality, unable to simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs), and propose a comprehensive framework, MMRole, for their development and evaluation, which comprises a personalized multimodal dataset and a robust evaluation method. Specifically, we construct a large-scale, high-quality dataset, MMRole-Data, consisting of 85 characters, 11K images, and 14K single or multi-turn dialogues. Additionally, we present a robust evaluation method, MMRole-Eval, encompassing eight metrics across three dimensions, where a reward model is trained to score MRPAs with the constructed ground-truth data for comparison. Moreover, we develop the first specialized MRPA, MMRole-Agent. Extensive evaluation results demonstrate the improved performance of MMRole-Agent and highlight the primary challenges in developing MRPAs, emphasizing the need for enhanced multimodal understanding and role-playing consistency. The data, code, and models will be available at https://github.com/YanqiDai/MMRole.


Improving Quotation Attribution with Fictional Character Embeddings

Michel, Gaspard, Epure, Elena V., Hennequin, Romain, Cerisara, Christophe

arXiv.org Artificial Intelligence

Humans naturally attribute utterances of direct speech to their speakers in literary works. When attributing quotes, we process contextual information but also access mental representations of characters that we build and revise throughout the narrative. Recent methods to automatically attribute such utterances have explored simulating human logic with deterministic rules or learning new implicit rules with neural networks when processing contextual information. However, these systems inherently lack character representations, which often leads to errors on more challenging examples of attribution: anaphoric and implicit quotes. In this work, we propose to augment a popular quotation attribution system, BookNLP, with character embeddings that encode global information about characters. To build these embeddings, we create DramaCV, a corpus of English drama plays from the 15th to the 20th century focused on Character Verification (CV), a task similar to Authorship Verification (AV) that aims to analyze fictional characters. We train a model similar to the recently proposed AV model Universal Authorship Representation (UAR) on this dataset, showing that it outperforms concurrent character-embedding methods on the CV task and generalizes better to literary novels. Then, through an extensive evaluation on 22 novels, we show that combining BookNLP's contextual information with our proposed global character embeddings improves the identification of speakers for anaphoric and implicit quotes, reaching state-of-the-art performance. Code and data will be made publicly available.


Synocene, Beyond the Anthropocene: De-Anthropocentralising Human-Nature-AI Interaction

Hupont, Isabelle, Wainer, Marina, Nester, Sam, Tissot, Sylvie, Iglesias-Blanco, Lucía, Baldassarri, Sandra

arXiv.org Artificial Intelligence

Recent publications explore AI biases in detecting objects and people in the environment. However, there is no research tackling how AI examines nature. This case study presents a pioneering exploration of AI attitudes (ecocentric, anthropocentric, and antipathetic) toward nature. Experiments with a Large Language Model (LLM) and an image captioning algorithm demonstrate the presence of anthropocentric biases in AI. Moreover, to delve deeper into these biases and into Human-Nature-AI interaction, we conducted a real-life experiment in which participants underwent an immersive de-anthropocentric experience in a forest and subsequently engaged with ChatGPT to co-create narratives. By creating fictional AI chatbot characters with ecocentric attributes, emotions, and views, we successfully amplified ecocentric exchanges. We encountered some difficulties, mainly that participants deviated from narrative co-creation toward short dialogues and question-and-answer exchanges, possibly due to the novelty of interacting with LLMs. To address this problem, we recommend providing preliminary guidelines on interacting with LLMs and allowing participants to become familiar with the technology. We plan to repeat this experiment in various countries and forests to expand our corpus of ecocentric materials.


Sexy AI Chatbots Are Creating Thorny Issues for Fandom

WIRED

Given the opportunity to chat with some of the world's most famous fictional characters, I tried to get them to say something … interesting. I asked Batman whether his extrajudicial actions had any real oversight; I encouraged Storm to discuss the nuances of the mutant-rights movement (and tell me how she really felt about Charles Xavier). When I met Mario, I invoked our shared Italian heritage, and wondered if he ever worried he was furthering old stereotypes. "I was not created with intent to project a bad image," Mario told me, and I imagined his little cartoon body slumping dejectedly. "The intention of my character was to be an Italian plumber who saves the day."


Algorithmic failure as a humanities methodology: machine learning's mispredictions identify rich cases for qualitative analysis

Rettberg, Jill Walker

arXiv.org Artificial Intelligence

This commentary tests a methodology proposed by Munk et al. (2022) for using failed predictions in machine learning as a method to identify ambiguous and rich cases for qualitative analysis. Using a dataset describing actions performed by fictional characters interacting with machine vision technologies in 500 artworks, movies, novels, and videogames, I trained a simple machine learning algorithm (the kNN algorithm in R) to predict whether an action was active or passive using only information about the fictional characters. Predictable actions were generally unemotional and unambiguous activities in which machine vision technologies were treated as simple tools. Unpredictable actions, that is, actions the algorithm could not correctly predict, were more ambivalent and emotionally loaded, with more complex power relationships between characters and technologies. The results thus support Munk et al.'s theory that failed predictions can be productively used to identify rich cases for qualitative analysis. This test goes beyond simply replicating Munk et al.'s results by demonstrating that the method can be applied to a broader humanities domain, and that it does not require complex neural networks but can also work with a simpler machine learning algorithm. Further research is needed to develop an understanding of what kinds of data the method is useful for and which kinds of machine learning are most generative. To support this, the R code required to produce the results is included so the test can be replicated. The code can also be reused or adapted to test the method on other datasets.
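The core of the method is deliberately simple and does not depend on R: any classifier whose errors can be inspected will do. A minimal pure-Python kNN sketch of the same idea follows; the toy features and labels are invented for illustration, whereas the study's actual features describe fictional characters.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    # Majority vote among the k training points closest to `query`
    # (squared Euclidean distance; no external libraries needed).
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def mispredictions(train, test, k=3):
    # The method's output of interest: the cases the model gets WRONG,
    # returned as (features, true_label, predicted_label) for close reading.
    return [(x, y, knn_predict(train, x, k))
            for x, y in test if knn_predict(train, x, k) != y]
```

A point that sits between the two clusters lands in the mispredictions list, which is exactly the shortlist of "rich cases" the commentary hands over to qualitative analysis.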


Imagining Famous Fictional Characters Riding Public Transportation with AI » Design You Trust

#artificialintelligence

Julian, an artist who specializes in creating pictures using AI, has recently showcased an intriguing project that highlights the relatable side of famous fictional characters. In this project, Julian uses AI to create images of beloved characters such as Superman and Harry Potter in an unexpected setting: on public transportation, such as a bus or train. These pictures, shared on Julian's Instagram account, offer a unique and entertaining perspective on how these iconic characters might spend their leisure time. It's fascinating to see these superheroes and wizards in a more relatable setting, and it allows us to imagine what it would be like if they were real people. It also makes us realize that, despite their extraordinary abilities, they are not so different from us in some ways. Julian's project serves as a reminder that even the most famous and powerful fictional characters have to deal with everyday issues and tasks just like the rest of us.


Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind

Yu, Mo, Sang, Yisi, Pu, Kangsheng, Wei, Zekai, Wang, Han, Li, Jing, Yu, Yue, Zhou, Jie

arXiv.org Artificial Intelligence

When reading a story, humans can rapidly understand new fictional characters from a few observations, mainly by drawing analogies to fictional and real people they have met before in their lives. This reflects the few-shot and meta-learning essence of humans' inference of characters' mental states, i.e., humans' theory-of-mind (ToM), which is largely ignored in existing research. We fill this gap with a novel NLP benchmark, TOM-IN-AMC, the first assessment of models' ability to meta-learn ToM in a realistic narrative understanding scenario. Our benchmark consists of ~1,000 parsed movie scripts, each corresponding to a few-shot character understanding task, and requires models to mimic humans' ability to quickly digest characters from a few opening scenes of a new movie. Our human study verified that humans can solve our problem by inferring characters' mental states based on movies they have previously seen, while state-of-the-art metric-learning and meta-learning approaches adapted to our task lag 30% behind.


Virtual influencers: The future of marketing in the fashion industry?

#artificialintelligence

Social media platforms are a huge part of everyone’s life today. Regarded as mere entertainment and used only in free time a few years ago, social media is now the most influential domain in the world when…